Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added 'inline' to content disposition check #3489

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

TaylorN15
Copy link

See discussion here. This adds inline to the allowed list of Content Disposition values when parsing EML files.

@cragwolfe
Copy link
Contributor

CC @MthwRobinson :)

@TaylorN15
Copy link
Author

@scanny @cragwolfe - one of the tests is failing but unrelated to my change. I’ll leave it for you guys to fix?

@MthwRobinson
Copy link
Contributor

@TaylorN15 - Is there a test you could add for this?

@TaylorN15
Copy link
Author

TaylorN15 commented Aug 7, 2024

@TaylorN15 - Is there a test you could add for this?

I checked and there is already a test EML file with inline and None for content disposition within the example docs that gets run during the test 😃

example-docs/eml/email-inline-content-disposition.eml

@MthwRobinson
Copy link
Contributor

Awesome! Would we have expected that test to fail prior to the fix?

@TaylorN15
Copy link
Author

No, it would have passed except the inline content would not have been returned. I thought that was a bit strange that it was in the test file but not catered for in the code.

¯_(ツ)_/¯

@MthwRobinson
Copy link
Contributor

In that case, could we add a test that asserts that the inline content is included now?

@TaylorN15
Copy link
Author

In that case, could we add a test that asserts that the inline content is included now?

It seems the test already has an assertion for content disposition of inline and None and the test passes in the current and this PR version of the code...

@scanny
Copy link
Collaborator

scanny commented Aug 7, 2024

@TaylorN15 here's what we need: A test that fails before your change and passes afterward.

If the current test can be modified to suit, that's fine. I can see the assertions there are pretty weak, maybe you can add something there.

But what I would do is this:

  • take your original problematic email with the MSIP labels, make a copy of it and anonymize it as necessary but keep all the parts. Add that to example-docs/.
  • write a test that does what you were trying to do and fails when run on main, just like the failure you were seeing, like no text content or whatever it was.
  • add that test in this PR and demonstrate that it passes (passing CI is this demonstration).

Basic idea is:

  • if we don't have a test that fails when we take out your change, how do we know your change fixed it?
  • what will defend against regression if no test fails when your change is accidentally removed?
  • Also it gives us a basis for reasoning about whether your change fixed the general-case problem or only an isolated use case.

@TaylorN15
Copy link
Author

@scanny I understand, I'm just not so great at writing tests :)
I need to work out what the difference is between the EML files to understand why one works when the other does not. I am guessing it's something to do with setting process_attachments as I don't do that with my version, but the code that processes attachments includes content disposition of inline and attachment

@scanny
Copy link
Collaborator

scanny commented Aug 8, 2024

@TaylorN15 why don't you post an anonymized version of the problematic message and I'll take a look. I expect I can whip up an appropriate test in a minute or two :)

@TaylorN15
Copy link
Author

@TaylorN15 why don't you post an anonymized version of the problematic message and I'll take a look. I expect I can whip up an appropriate test in a minute or two :)

The original email I linked should suffice. I have just confirmed though that the issue is whether process_attachments is set or not. If true, all the content with a content disposition (including inline) is processed, however in my application I am not processing attachments which is how I first discovered this. The text in my email with the content disposition of inline isn't an attachment though, so I think the assumption that all message body/text has no content disposition is false.

@scanny
Copy link
Collaborator

scanny commented Aug 8, 2024

k, this test should be plenty:

def test_partition_reads_message_part_with_inline_content_disposition():
    elements = partition_email(
        example_doc_path("eml/text-part-marked-inline.eml"), process_attachments=False
    )

    assert len(elements) == 1
    e = elements[0]
    assert e.text.startswith("Hi   Please find attached a project proposal.")
    assert e.text.endswith("Kind regards User  ")

Just name your original file text-part-marked-inline.eml and add it to example-docs/eml/.

You can add this test right after the one test_partition_email_inline_content_disposition().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants